Abstract:

We estimate the derivative of a probability density function defined on $[0,\infty)$. For this purpose, we choose the class of kernel estimators with asymmetric gamma kernel functions. The use of gamma kernels is fruitful because they are nonnegative, change their shape depending on the position on the semi-axis and possess good boundary properties for a wide class of densities. We find an optimal bandwidth of the kernel as the minimizer of the mean integrated squared error for dependent data with strong mixing. This bandwidth differs from the one proposed for gamma kernel density estimation. To this end, we derive the covariance of derivatives of the density and deduce its upper bound. Finally, the obtained results are applied to the case of a first-order autoregressive process with strong mixing. The accuracy of the estimates is checked by a simulation study. A comparison of the proposed estimates based on independent and dependent data is provided.

1. INTRODUCTION


Kernel density estimation is a non-parametric method to estimate a probability density function (pdf) $f(x)$. It was originally studied in [20], [22] for symmetric kernels and univariate independent identically distributed (i.i.d.) data. When the support of the underlying pdf is unbounded, this approach performs well. If the pdf has support on $[0,\infty)$, the use of classical estimation methods with symmetric kernels yields a large bias at the zero boundary and leads to a bad quality of the estimates [3D]. This is due to the fact that symmetric kernel estimators assign nonzero weight to the interval $(-\infty,0)$. There are several methods to reduce the boundary bias effect, for example, the data reflection [25], boundary kernels [19], the hybrid method [33], the local linear estimator [18] . m among others. Another approach is to use asymmetric kernels. In the case of univariate nonnegative i.i.d. random variables (r.v.s), the pdf estimators with gamma kernels were proposed in [8]. In [5] the gamma-kernel estimator was developed for univariate dependent data. The gamma kernel is nonnegative and it changes its shape depending on the position on the semi-axis. Estimators constructed with gamma kernels have no boundary bias if $f''(0) = 0$ holds, i.e. when the underlying density $f(x)$ has a shoulder at $x = 0$ (see formula (4.3) in [31]). This shoulder property is fulfilled particularly for a wide exponential class of pdfs which satisfy the important integral condition

$$\int_0^\infty x^{-1/2} f(x)\,dx < \infty \tag{1.1}$$

assumed in [8]. In [31] the half normal and standard exponential pdfs are considered as examples such that the boundary kernel $K_c(t)$ (p. 553 in [31]) gives a better estimate than the gamma-kernel estimator considered in [8]. At the same time, the exponential distribution does not satisfy both the shoulder condition and the condition (1.1). The half normal density satisfies the shoulder condition, but it does not satisfy (1.1). Since (1.1) is not valid for the latter pdfs, such a comparison is not appropriate.

Alternative asymmetric kernel estimators like the inverse Gaussian and reciprocal inverse Gaussian estimators were studied in [24]. A comparison of these asymmetric kernels with the gamma kernel is given in [6].

Along with density estimation, it is often necessary to estimate the derivative of a pdf. Derivative estimation is important in the exploration of structures in curves, comparison of regression curves, analysis of human growth data, mean shift clustering or hypothesis testing. The estimation of the density derivative is required to estimate the logarithmic derivative of the density function. The latter is of practical importance in finance, actuarial mathematics, climatology and signal processing. However, the problem of density derivative estimation has received less attention. This is due to the significantly increased complexity of the calculations, especially in the multivariate case. The boundary bias problem for the multivariate pdf becomes more severe [4]. The pioneering papers devoted to univariate symmetrical kernel density derivative estimation are m,m-

Figure 1: Nonparametric gamma-kernel estimation of the Maxwell density derivative function for sample size $n = 2000$. The pdf derivative (solid line), the estimate with $b$ (dotted gray line), the estimate with $b_1$ (dashed line).

The paper does not focus on the boundary performance but on finding the optimal bandwidth that is appropriate for pdf derivative estimation in the case of dependent data satisfying a strong mixing condition. In [30] an optimal mean integrated squared error (MISE) of the kernel estimate of the first derivative of order $n^{-4/7}$ was indicated. This corresponds to the optimal bandwidth of order $n^{-1/7}$ for symmetrical kernels. The estimation of the univariate density derivative using a gamma kernel estimator for independent data was proposed in ED, m-. This allows us to achieve the optimal MISE of the same order $n^{-4/7}$ with a bandwidth of order $n^{-2/7}$.


1.1. Contributions of this paper 


It is shown that in the case of dependent data, assuming strong mixing, we can estimate the derivative of the pdf using the same technique that has been applied for independent data in [11]. Lemma 2.1 in Section 2.1 contains the upper bound of the covariance. The mathematical technique applied for the derivative estimation is similar to the one applied for the pdf. However, the formulas become much more complicated, particularly because one has to deal with the special Digamma function that includes the bandwidth $b$. Thus, one has to pick out the order in $b$ from complicated expressions containing logarithms and the special function. In Section 2.2 we find the optimal bandwidth $b \sim n^{-2/7}$, which is different from the optimal bandwidth $b_1 \sim n^{-2/5}$ proposed for the pdf estimation (see [8], p. 476). In Fig. 1 it is shown that the use of $b_1$ to estimate the pdf derivative leads to a bad quality (for simplicity the i.i.d. data were taken). We prove that the optimal MISE of the pdf derivative has the same rate of convergence to the true pdf derivative as in the independent case, namely $O(n^{-4/7})$. We show in Section 2.3 that for the strong mixing autoregressive process of the first order (AR(1)) all results are valid without additional conditions. In Section 3 a simulation study for i.i.d. and dependent samples is performed. The flexibility of the gamma kernel allows us to fit accurately the multi-modal pdf derivatives.


1.2. Practical motivation 


In practice it is often necessary to deal with sequences of observations that are derived from stationary processes satisfying the strong mixing condition. As an example of such processes one can take autoregressive processes like in Section 2.3. Along with the evaluation of the density function and its derivative from dependent samples, the estimation of the logarithmic derivative of the density is a topical problem. The logarithmic pdf derivative is the ratio of the derivative of the pdf to the pdf itself. The pdf derivative estimation is necessary for optimal filtering in signal processing and control of nonlinear processes where only the exponential pdf class is used [10]. Moreover, the pdf derivative gives information about the slope of the pdf curve, its local extremes and significant features in data, and it is useful in regression analysis [9]. The pdf derivative also plays a key role in clustering via mode seeking [23].


1.3. Theoretical background 


Let $\{X_i;\ i = 1, 2, \ldots\}$ be a strongly stationary sequence with an unknown probability density function $f(x)$, which is defined on $x \in [0,\infty)$. We assume that the sequence $\{X_i\}$ is $\alpha$-mixing with coefficient

$$\alpha(i) = \sup_{k}\ \sup_{A \in \mathcal{F}_1^k,\ B \in \mathcal{F}_{k+i}^{\infty}} |P(A \cap B) - P(A)P(B)|.$$

Here, $\mathcal{F}_i^k$ is the $\sigma$-field of events generated by $\{X_j,\ i \le j \le k\}$ and $\alpha(i) \to 0$ as $i \to \infty$. For these sequences we will use the notation $\{X_j\}_{j \ge 1} \in \mathcal{S}(\alpha)$. Let $f_i(x,y)$ be the joint density of $X_1$ and $X_{1+i}$, $i = 1, 2, \ldots$.

Our objective is to estimate the derivative $f'(x)$ from a known sequence of observations $\{X_i\}$. We use the non-symmetric gamma kernel estimator that was defined in [8] by the formula

$$\hat f_n(x) = \frac{1}{n}\sum_{i=1}^{n} K_{\rho_b(x),b}(X_i). \tag{1.2}$$

Here

$$K_{\rho_b(x),b}(t) = \frac{t^{\rho_b(x)-1}\exp(-t/b)}{b^{\rho_b(x)}\,\Gamma(\rho_b(x))} \tag{1.3}$$

is the kernel function, $b$ is a smoothing parameter (bandwidth) such that $b \to 0$ as $n \to \infty$, $\Gamma(\cdot)$ is the standard gamma function and

$$\rho_b(x) = \begin{cases} \rho_1(x) = x/b, & \text{if } x \ge 2b, \\ \rho_2(x) = (x/(2b))^2 + 1, & \text{if } x \in [0, 2b). \end{cases} \tag{1.4}$$
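The estimator (1.2)-(1.4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the paper: the function names `rho`, `gamma_kernel` and `density_estimate` are hypothetical, and the kernel is evaluated in log space (via `math.lgamma`) purely as a numerical convenience.

```python
import math


def rho(x, b):
    """Shape parameter rho_b(x) from (1.4)."""
    if x >= 2.0 * b:
        return x / b
    return (x / (2.0 * b)) ** 2 + 1.0


def gamma_kernel(t, x, b):
    """Gamma kernel K_{rho_b(x), b}(t) from (1.3), computed in log space."""
    if t <= 0.0:
        return 0.0
    r = rho(x, b)
    log_k = (r - 1.0) * math.log(t) - t / b - r * math.log(b) - math.lgamma(r)
    return math.exp(log_k)


def density_estimate(x, sample, b):
    """Estimator (1.2): average of the kernels evaluated at the observations."""
    return sum(gamma_kernel(xi, x, b) for xi in sample) / len(sample)
```

For fixed $x$ the kernel is a Gamma($\rho_b(x)$, $b$) density in $t$, so it integrates to one over $t$, and the two branches of (1.4) agree at $x = 2b$.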


The use of gamma kernels is due to the fact that they are nonnegative, change their shape depending on the position on the semi-axis and possess better boundary bias than symmetrical kernels. The boundary bias becomes larger for multivariate densities. Hence, to overcome this problem the gamma kernels were applied in [3]. Earlier the gamma kernels were only used for the density estimation of identically distributed sequences in 0,0 and for stationary sequences in 0 -


To the best of our knowledge, the gamma kernels were applied to density derivative estimation for the first time in [11]. In that paper the derivative $f'(x)$ was estimated under the assumption that $\{X_1, X_2, \ldots, X_n\}$ are i.i.d. random variables as the derivative of (1.2). This implies that

$$\hat f'_n(x) = \frac{1}{n}\sum_{i=1}^{n} K'_{\rho_b(x),b}(X_i) \tag{1.5}$$

holds, where

$$K'_{\rho_b(x),b}(t) = \begin{cases} K'_{\rho_1(x),b}(t) = \dfrac{1}{b}\,K_{\rho_1(x),b}(t)\,L_1(t), & \text{if } x \ge 2b, \\[2mm] K'_{\rho_2(x),b}(t) = \dfrac{x}{2b^2}\,K_{\rho_2(x),b}(t)\,L_2(t), & \text{if } x \in [0, 2b), \end{cases} \tag{1.6}$$

is the derivative of $K_{\rho_b(x),b}(t)$ with respect to $x$, and

$$L_1(t) = L_1(t,x) = \ln t - \ln b - \Psi(\rho_1(x)), \qquad L_2(t) = L_2(t,x) = \ln t - \ln b - \Psi(\rho_2(x)). \tag{1.7}$$

Here $\Psi(x)$ denotes the Digamma function (the logarithmic derivative of the gamma function). The unknown smoothing parameter $b$ was obtained as the minimizer of the mean integrated squared error (MISE) which, as is known, is equal to

$$MISE(\hat f'_n(x)) = E\int_0^\infty \left(f'(x) - \hat f'_n(x)\right)^2 dx.$$

Remark 1.1. The latter integral can be split into two integrals, $\int_0^{2b}$ and $\int_{2b}^{\infty}$. The integral $\int_0^{2b}$ tends to zero when $b \to 0$. Hence, we omit the consideration of this integral in contrast to [31]. The first integral has the same order in $b$ as the second one, thus it cannot affect the selection of the optimal bandwidth.
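The derivative kernel (1.6)-(1.7) and the estimator (1.5) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, and since the Python standard library has no Digamma function, `digamma` below is a simple stand-in based on the recurrence $\Psi(z) = \Psi(z+1) - 1/z$ and the standard asymptotic series.

```python
import math


def digamma(z):
    """Digamma for z > 0: recurrence up to z >= 6, then an asymptotic series."""
    shift = 0.0
    while z < 6.0:
        shift -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    series = math.log(z) - 0.5 / z - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series


def rho_and_slope(x, b):
    """rho_b(x) from (1.4) together with its derivative in x."""
    if x >= 2.0 * b:
        return x / b, 1.0 / b
    return (x / (2.0 * b)) ** 2 + 1.0, x / (2.0 * b * b)


def gamma_kernel(t, x, b):
    """Gamma kernel (1.3) for t > 0."""
    if t <= 0.0:
        return 0.0
    r, _ = rho_and_slope(x, b)
    return math.exp((r - 1.0) * math.log(t) - t / b - r * math.log(b) - math.lgamma(r))


def gamma_kernel_dx(t, x, b):
    """Derivative in x of the gamma kernel, following (1.6)-(1.7); needs t > 0."""
    r, slope = rho_and_slope(x, b)
    l_term = math.log(t) - math.log(b) - digamma(r)
    return slope * gamma_kernel(t, x, b) * l_term


def derivative_estimate(x, sample, b):
    """Estimator (1.5): average of the differentiated kernels."""
    return sum(gamma_kernel_dx(xi, x, b) for xi in sample) / len(sample)
```

A quick sanity check of such a sketch is to compare `gamma_kernel_dx` with a central finite difference of `gamma_kernel` in `x`, both in the interior ($x \ge 2b$) and in the boundary region ($x < 2b$).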


The following theorem has been proved.

Theorem 1.1. [11] If $b \to 0$ and $nb^{3/2} \to \infty$ as $n \to \infty$, the integrals

$$\int_0^\infty P(x)\,dx, \qquad \int_0^\infty x^{-3/2} f(x)\,dx$$

are finite and $\int_0^\infty P(x)\,dx \ne 0$, then the leading term of the MISE expansion of the density derivative estimate $\hat f'_n(x)$ is equal to

$$MISE(\hat f'_n(x)) = \frac{b^2}{16}\int_0^\infty P(x)\,dx + \int_0^\infty \frac{n^{-1}b^{-3/2}x^{-3/2}}{4\sqrt{\pi}}\left(f(x) + \frac{b}{2}\left(\frac{f(x)}{x} - f'(x)\right)\right)dx + o\left(b^2 + n^{-1}b^{-3/2}\right), \tag{1.8}$$

where

$$P(x) = \left(\frac{f(x)}{3x^2} + f''(x)\right)^2.$$

Taking the derivative of (1.8) in $b$ leads to the equation

$$\frac{b}{8}\int_0^\infty \left(\frac{f(x)}{3x^2} + f''(x)\right)^2 dx - \frac{3n^{-1}b^{-5/2}}{8\sqrt{\pi}}\int_0^\infty x^{-3/2} f(x)\,dx - \frac{n^{-1}b^{-3/2}}{16\sqrt{\pi}}\int_0^\infty x^{-3/2}\left(\frac{f(x)}{x} - f'(x)\right)dx = 0. \tag{1.9}$$

Neglecting the term with $b^{-3/2}$ as compared to the term with $b^{-5/2}$, the equation becomes simpler and its solution is the optimal global bandwidth

$$b_0 = \left(\frac{3\int_0^\infty x^{-3/2} f(x)\,dx}{\sqrt{\pi}\int_0^\infty \left(\frac{f(x)}{3x^2} + f''(x)\right)^2 dx}\right)^{2/7} n^{-2/7}. \tag{1.10}$$

The substitution of $b_0$ into (1.8) yields an optimal MISE with the rate of convergence $O(n^{-4/7})$. The unknown density and its second derivative in (1.10) were estimated by the rule of thumb method [12].
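To make formula (1.10) concrete, the sketch below evaluates $b_0$ numerically when $f$ is a known Maxwell density. This is an assumption made purely for illustration: it is not the rule of thumb procedure of [12], and the function names, the midpoint rule, the truncation point `upper` and the grid size are all hypothetical choices.

```python
import math


def maxwell_pdf(x, sigma):
    """Maxwell pdf with scale sigma."""
    c = math.sqrt(2.0 / math.pi) / sigma ** 3
    return c * x * x * math.exp(-x * x / (2.0 * sigma ** 2))


def maxwell_pdf_dd(x, sigma):
    """Second derivative of the Maxwell pdf."""
    c = math.sqrt(2.0 / math.pi) / sigma ** 3
    u = (x / sigma) ** 2
    return c * (2.0 - 5.0 * u + u * u) * math.exp(-0.5 * u)


def midpoint(g, lo, hi, steps):
    """Midpoint rule; it never evaluates g at the endpoint x = 0."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))


def optimal_bandwidth(n, sigma, upper=40.0, steps=100_000):
    """b_0 from (1.10), with the integrals truncated to [0, upper]."""
    num = midpoint(lambda x: x ** -1.5 * maxwell_pdf(x, sigma), 0.0, upper, steps)
    den = midpoint(
        lambda x: (maxwell_pdf(x, sigma) / (3.0 * x * x) + maxwell_pdf_dd(x, sigma)) ** 2,
        0.0, upper, steps)
    return (3.0 * num / (math.sqrt(math.pi) * den)) ** (2.0 / 7.0) * n ** (-2.0 / 7.0)
```

Because the two integrals do not depend on $n$, the bandwidth returned by this sketch scales exactly as $n^{-2/7}$, in line with (1.10).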


In [30], p. 49, an optimal MISE of the first derivative kernel estimate of order $n^{-4/7}$ with a bandwidth of order $n^{-1/7}$ was indicated for symmetrical kernels. Our procedure achieves the same order $n^{-4/7}$ with a bandwidth of order $n^{-2/7}$. Moreover, our advantage concerns the reduction of the bias of the density derivative at the zero boundary by means of asymmetric kernels. Gamma kernels allow us to avoid boundary transformations, which is especially important for multivariate cases.

Further results presented in Section 2.2 will be based on Theorem 1.1.

2. Main Results


2.1. Estimation of the density derivative by dependent data 

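Here, we estimate the density derivative by means of the kernel estimator (1.5) from dependent data. Thus, its mean squared error is determined as

$$MSE(\hat f'_n(x)) = \left(Bias(\hat f'_n(x))\right)^2 + \mathrm{var}(\hat f'_n(x)), \tag{2.1}$$

where, due to the stationarity of the process $X_i$, the variance is given by

$$\begin{aligned}
\mathrm{var}(\hat f'_n(x)) &= \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} K'_b(X_i)\right) = \frac{1}{n^2}\,\mathrm{var}\left(\sum_{i=1}^{n} K'_b(X_i)\right) \\
&= \frac{1}{n^2}\left(\sum_{i=1}^{n}\mathrm{var}(K'_b(X_i)) + 2\sum_{1 \le i < j \le n}\mathrm{cov}(K'_b(X_i), K'_b(X_j))\right) \\
&= \frac{1}{n}\,\mathrm{var}(K'_b(X_i)) + \frac{2}{n^2}\sum_{1 \le i < j \le n}\mathrm{cov}(K'_b(X_i), K'_b(X_j)) \\
&= \frac{1}{n}\,\mathrm{var}(K'_b(X_1)) + \frac{2}{n}\sum_{i=1}^{n-1}\left(1 - \frac{i}{n}\right)\mathrm{cov}(K'_b(X_1), K'_b(X_{1+i})) \\
&= V(x) + C(x).
\end{aligned}$$

For simplicity we use here and further the notation $K'_b(t) = K'_{\rho_b(x),b}(t)$ for the kernel in (1.5). Thus, (2.1) can be written as

$$MSE(\hat f'_n(x)) = B(x)^2 + V(x) + C(x), \tag{2.2}$$

where $B(x) = Bias(\hat f'_n(x))$. The bias of the estimate does not change, but the variance contains a covariance term. An upper bound for it is given in the next lemma.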
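The split of the variance into $V(x) + C(x)$ is the usual identity for the variance of a mean of a stationary sequence. The sketch below checks it numerically for an arbitrary autocovariance function; the function names and the geometric autocovariance used in the usage note are hypothetical illustrations, not quantities from the paper.

```python
def variance_of_mean_direct(gamma, n):
    """Variance of the sample mean from the full n x n covariance matrix."""
    return sum(gamma(abs(i - j)) for i in range(n) for j in range(n)) / n ** 2


def variance_of_mean_split(gamma, n):
    """The same quantity written as V + C, the form used in (2.2)."""
    v = gamma(0) / n
    c = (2.0 / n) * sum((1.0 - i / n) * gamma(i) for i in range(1, n))
    return v + c
```

For example, with `gamma = lambda k: 0.6 ** k` and `n = 40` both functions return the same value up to rounding error.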

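Lemma 2.1. Let

1. $\{X_j\}_{j \ge 1} \in \mathcal{S}(\alpha)$ and $\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau < \infty$, $0 < \upsilon < 1$, hold,

2. $f(x)$ be a twice continuously differentiable function,

3. $b \to 0$ and $nb^{(\upsilon+1)/2} \to \infty$ as $n \to \infty$.

Then the covariance $C(x)$ is bounded by

$$\begin{aligned}
|C(x)| &= \left|\frac{2}{n}\sum_{i=1}^{n-1}\left(1 - \frac{i}{n}\right)\mathrm{cov}\left(K'_{\rho_b(x),b}(X_1), K'_{\rho_b(x),b}(X_{1+i})\right)\right| \\
&\le \left(2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,\frac{b^{-\frac{\upsilon+1}{2}}}{n}\left(b^2 C_2(\upsilon,x) + b\,C_1(\upsilon,x) + C_3(\upsilon,x)\right)^{1-\upsilon} + o(b^2)\right)\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau,
\end{aligned} \tag{2.3}$$

where $K'_{\rho_b(x),b}(\tau)$ is defined by (1.6) and $C_1(\upsilon,x)$, $C_2(\upsilon,x)$ and $C_3(\upsilon,x)$ are given by (4.8).

A similar lemma was proved in m for symmetrical kernels and not strictly positive $x$.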

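2.2. Mean integrated squared error of $\hat f'_n(x)$

Using the upper bound (2.3) we can obtain an upper bound of the MISE and find the expression of the optimal bandwidth $b$ as the minimizer of the latter.

Theorem 2.1. If the conditions of Theorem 1.1 and Lemma 2.1 hold, then the MISE of the estimate $\hat f'_n(x)$ of the density derivative is bounded by

$$\begin{aligned}
MISE(\hat f'_n(x)) \le{} & \int_0^\infty \frac{n^{-1}b^{-3/2}x^{-3/2}}{4\sqrt{\pi}}\left(f(x) + \frac{b}{2}\left(\frac{f(x)}{x} - f'(x)\right)\right)dx \\
& + \int_0^\infty 2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,\frac{b^{-\frac{\upsilon+1}{2}}}{n}\,C_3(\upsilon,x)^{1-\upsilon}\,dx\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau \\
& + \frac{b^2}{16}\int_0^\infty P(x)\,dx + o\left(b^2 + n^{-1}b^{-3/2}\right),
\end{aligned} \tag{2.4}$$

and the optimal bandwidth is $b_{opt} = O(n^{-2/7})$ and $MISE_{opt} = O(n^{-4/7})$.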


Remark 2.1. It is evident from formula (2.4) that the term responsible for the covariance has the order $n^{-1}b^{-(\upsilon+1)/2}$, $0 < \upsilon < 1$. Thus, it does not influence the order of the MISE irrespective of the mixing coefficient $\alpha(\tau)$.
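As a worked check of this remark, one can substitute a bandwidth of the optimal order, $b = c\,n^{-2/7}$, into the variance and covariance orders. The constant $c > 0$ is a placeholder introduced here for illustration only.

```latex
\begin{aligned}
n^{-1} b^{-3/2} &= c^{-3/2}\, n^{-1 + 3/7} = c^{-3/2}\, n^{-4/7}, \\
n^{-1} b^{-(\upsilon+1)/2} &= c^{-(\upsilon+1)/2}\, n^{-1 + (\upsilon+1)/7}
  = c^{-(\upsilon+1)/2}\, n^{(\upsilon-6)/7}, \\
\frac{n^{-1} b^{-(\upsilon+1)/2}}{n^{-1} b^{-3/2}} &= b^{(2-\upsilon)/2} \to 0
  \quad \text{as } b \to 0,\ 0 < \upsilon < 1 .
\end{aligned}
```

Since $(\upsilon - 6)/7 < -5/7 < -4/7$ for every $0 < \upsilon < 1$, the covariance contribution decays faster than the variance contribution.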


The proof is given in Appendix 4.

2.3. Example of a strong mixing process


We use the first-order autoregressive process as an example of a process that satisfies Theorem 1.1. $X_i$ determines a first-order autoregressive (AR(1)) process with the innovation r.v. $\epsilon_0$ and the autoregressive parameter $\rho \in (-1,1)$ if

$$X_i = \rho X_{i-1} + \epsilon_i, \qquad i = \ldots, -1, 0, 1, \ldots, \tag{2.5}$$

holds and $\epsilon_i$ is a sequence of i.i.d. r.v.s. Let the AR(1) process (2.5) be strong mixing with mixing numbers $\alpha(\tau)$, $\tau = 1, 2, \ldots$,

$$\alpha(\tau) \le \alpha_0(\tau) = \begin{cases} 2(C+1)\,E|X_i|^{\nu}\,|\rho^{\nu}|^{\tau}, & \text{if } \tau > \tau_0, \\ 1, & \text{if } 1 \le \tau \le \tau_0, \end{cases} \tag{2.6}$$

where $\nu = \min\{p, q, 1\}$ and $p > 0$, $q > 0$, $C > 0$, $\tau_0 > 0$ hold. In [2] it was proved that under some conditions AR(1) is a strongly mixing process.

In Appendix 4 we prove the following lemma.

Lemma 2.2. Under the conditions (2.6) the AR(1) process (2.5) satisfies Lemma 2.1 and Theorem 2.1.
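A path of the AR(1) process (2.5) can be simulated as in the minimal sketch below. The function names, the burn-in length and the fixed seed are assumptions made for illustration; the noise sampler is passed in by the caller, and nothing here reproduces the paper's own simulation code.

```python
import random


def simulate_ar1(n, rho, noise, burn_in=500, seed=0):
    """Generate n values of X_i = rho * X_{i-1} + eps_i after a burn-in period."""
    rng = random.Random(seed)
    x = 0.0
    path = []
    for step in range(n + burn_in):
        x = rho * x + noise(rng)
        if step >= burn_in:
            path.append(x)
    return path


def lag1_autocorrelation(path):
    """Sample autocorrelation at lag one."""
    mean = sum(path) / len(path)
    num = sum((a - mean) * (b - mean) for a, b in zip(path, path[1:]))
    den = sum((a - mean) ** 2 for a in path)
    return num / den
```

For instance, `simulate_ar1(20000, 0.3, lambda rng: rng.gammavariate(1.5, 1.0))` gives a positive-valued path with Gamma innovations whose lag-one autocorrelation is close to $\rho = 0.3$.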


3. Simulation results 


To investigate the performance of the gamma-kernel estimator we select the following pdfs defined on the positive semi-axis: the Maxwell ($\sigma = 2$), the Weibull ($a = 1$, $b = 4$) and the Gamma ($\alpha = 2.43$, $\beta = 1$) pdf,

$$f_M(x) = \frac{\sqrt{2}\,x^2\exp(-x^2/(2\sigma^2))}{\sigma^3\sqrt{\pi}}, \qquad f_W(x) = s\,x^{s-1}\exp(-x^s), \qquad f_G(x) = \frac{x^{\alpha-1}\exp(-x/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)}.$$

Their derivatives

$$\begin{aligned}
f'_M(x) &= -\frac{\sqrt{2}\,x\exp(-x^2/(2\sigma^2))\,(x^2 - 2\sigma^2)}{\sigma^5\sqrt{\pi}}, \\
f'_W(x) &= -s\,x^{s-2}\exp(-x^s)\,(s x^s - s + 1), \\
f'_G(x) &= -\frac{x^{\alpha-2}\exp(-x/\beta)\,(\beta + x - \alpha\beta)}{\beta^{\alpha+1}\,\Gamma(\alpha)}
\end{aligned} \tag{3.1}$$

are to be estimated. The Weibull and the Gamma pdfs are frequently used in a wide range of applications in engineering, signal processing, medical research, quality control, actuarial science and climatology among others. For example, most total insurance claim distributions are shaped like gamma pdfs [13]. The gamma distribution is also used to model rainfalls [1]. Gamma class pdfs, like Erlang and $\chi^2$ pdfs, are widely used in modeling insurance portfolios [15].

Figure 2: Estimates of the Maxwell pdf derivative by i.i.d. data (left) and by dependent data (right): the $f'_M(x)$ (black line), gamma kernel estimate from the rule of thumb (grey line) for the sample size $n = 2000$.

Figure 3: Estimates of the Weibull pdf derivative by i.i.d. data (left) and by dependent data (right): the $f'_W(x)$ (black line), gamma kernel estimate from the rule of thumb (grey line) for the sample size $n = 2000$.
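The test densities and their derivatives (3.1) can be written down directly, as in the sketch below. The function names are hypothetical, and the default Weibull shape `s = 4.0` is an assumption made here for illustration, since the text states the Weibull parameters as ($a = 1$, $b = 4$) while the formula is written in terms of $s$.

```python
import math


def maxwell(x, sigma=2.0):
    c = math.sqrt(2.0 / math.pi) / sigma ** 3
    return c * x * x * math.exp(-x * x / (2.0 * sigma ** 2))


def maxwell_dx(x, sigma=2.0):
    c = math.sqrt(2.0 / math.pi) / sigma ** 5
    return -c * x * (x * x - 2.0 * sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))


def weibull(x, s=4.0):
    return s * x ** (s - 1.0) * math.exp(-x ** s)


def weibull_dx(x, s=4.0):
    return -s * x ** (s - 2.0) * math.exp(-x ** s) * (s * x ** s - s + 1.0)


def gamma_pdf(x, alpha=2.43, beta=1.0):
    return x ** (alpha - 1.0) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def gamma_pdf_dx(x, alpha=2.43, beta=1.0):
    norm = beta ** (alpha + 1.0) * math.gamma(alpha)
    return -x ** (alpha - 2.0) * math.exp(-x / beta) * (beta + x - alpha * beta) / norm
```

Each derivative can be cross-checked against a central finite difference of the corresponding pdf at a few interior points.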

We generate Maxwell, Weibull and Gamma i.i.d. samples with sample sizes $n \in \{100, 500, 1000, 2000\}$ using standard Matlab generators. To get the dependent data we generate Markov chains with the same stationary distributions using the Metropolis-Hastings algorithm m-. Because a move from the previous point to the next one can be rejected with some probability, the variance of such a Markov sequence $\{X_i\}$ is corrupted by a function of this rejection probability (see m, Theorem 3.1). The Metropolis-Hastings Markov chains [16] are geometrically ergodic for the underlying light-tailed distributions. Hence, they satisfy the strong mixing condition m-
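A dependent sample with a prescribed stationary pdf can be produced, for example, by a random-walk Metropolis chain. The sketch below targets the Maxwell pdf; the Gaussian proposal, the step size, the starting point and the seed are assumptions chosen for illustration, since the proposal used in the simulations is not specified in the text.

```python
import math
import random


def maxwell_log_density(x, sigma):
    """Log of the Maxwell pdf up to an additive constant; -inf outside (0, inf)."""
    if x <= 0.0:
        return float("-inf")
    return 2.0 * math.log(x) - x * x / (2.0 * sigma ** 2)


def metropolis_hastings(n, sigma=2.0, step=2.0, start=1.0, seed=0):
    """Random-walk Metropolis chain whose stationary law is Maxwell(sigma)."""
    rng = random.Random(seed)
    x = start
    chain = []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = maxwell_log_density(proposal, sigma) - maxwell_log_density(x, sigma)
        # Accept with probability min(1, ratio); a rejected move repeats x.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        chain.append(x)
    return chain
```

Rejected proposals repeat the current value, which is the source of the dependence (and of the extra variance) mentioned above.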

The gamma kernel estimates (1.2) with the optimal bandwidth (1.10) for the derivatives (3.1) can be seen in Figures 2-4. The optimal bandwidth (1.10) is computed for every replication of the simulation using the rule of thumb method, where we take the gamma pdf as a reference density. The estimation error of the pdf derivative is calculated by the following formula

$$m = \int_0^\infty \left(f'(x) - \hat f'(x)\right)^2 dx,$$

where $f'(x)$ is the true derivative and $\hat f'(x)$ is its estimate. Values of $m$ averaged over 500 simulated samples and the standard deviations for the underlying distributions are given in Table 1 for i.i.d. r.v.s and in Table 2 for dependent data.

Figure 4: Estimates of the Gamma pdf derivative by i.i.d. data (left) and by dependent data (right): the $f'_G(x)$ (black line), gamma kernel estimate from the rule of thumb (grey line) for the sample size $n = 2000$.

n          100                     500                     1000                     2000
Gamma      0.032792 (0.011967)     0.015208 (0.0044094)    0.010675 (0.0027815)     0.0074668 (0.0016452)
Weibull    2.0056 (0.52931)        1.1987 (0.25172)        0.9157 (0.18333)         0.69155 (0.12178)
Maxwell    0.0077597 (0.0033915)   0.0035692 (0.0015351)   0.0028675 (0.00099263)   0.0020923 (0.00068739)

Table 1: Mean errors $m$ and standard deviations (in parentheses) for i.i.d. r.v.s

n          100                     500                     1000                     2000
Gamma      0.039226 (0.015824)     0.018124 (0.006055)     0.01252 (0.0038485)      0.0086675 (0.0023361)
Weibull    2.2052 (1.1585)         1.3009 (0.5957)         0.97509 (0.41041)        0.75382 (0.28755)
Maxwell    0.0077694 (0.006793)    0.0039277 (0.0028336)   0.002878 (0.0020021)     0.0027313 (0.0016573)

Table 2: Mean errors $m$ and standard deviations (in parentheses) for strong mixed r.v.s

Figure 5: Gamma-kernel estimates of the pdf of the AR model with the Gamma noise and $\rho \in \{0.1, 0.2, 0.3, 0.4\}$ for the sample size $n = 2000$.

As expected, the mean error and the standard deviation decrease when the sample size rises, and this holds both for the i.i.d. and the dependent case. The performance of the gamma kernel changes when dependence is introduced, but the results in both tables are close. The mean errors are very close due to the fact that the bandwidth parameter is selected to minimize this error. However, the standard deviations for the dependent data are higher than for the i.i.d. r.v.s. For example, for the sample size of 500 the mean error and the standard deviation for the Maxwell pdf are 0.0035692 (0.0015351) for the i.i.d. r.v.s and 0.0039277 (0.0028336) for the dependent r.v.s. They differ due to the contribution of the Metropolis-Hastings rejection probability. This difference is less pronounced for larger sample sizes.
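The error measure $m$ used in the tables is an integrated squared error, which in practice has to be approximated on a finite grid. The sketch below uses a midpoint rule on $[0, \text{upper}]$; the function name, the truncation point and the grid size are assumptions for illustration, not the settings used to produce Tables 1 and 2.

```python
def integrated_squared_error(true_fn, estimate_fn, upper, steps=2000):
    """Midpoint-rule approximation of the integral of (true - estimate)^2 on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h  # midpoints avoid evaluating at x = 0
        diff = true_fn(x) - estimate_fn(x)
        total += diff * diff
    return total * h
```

Averaging this quantity over independent replications gives the mean error, and the sample standard deviation over replications gives the bracketed values.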

The Metropolis-Hastings algorithm gives the opportunity to generate AR processes with known pdfs. As a consequence we know their derivatives and can find the mean errors and standard deviations of the gamma-kernel density derivative estimates for the dependent data. When we instead specify the noise distribution $\{\epsilon\}$ of the AR model (2.5) and the autoregressive parameter $\rho$ that influences the dependence rate (2.6), we cannot indicate in general the true pdf of the process. Hence, we consider the histogram based on 200000 observations as the true pdf. As the noise distribution $\{\epsilon\}$ let us take the Gamma distribution ($\alpha = 1.5$, $\beta = 1$) and the Maxwell distribution ($\sigma = 1$). In [5] it was proved that for strongly mixed r.v.s the gamma-kernel estimator of the pdf achieves the same optimal rate of convergence in terms of the mean integrated squared error as in the i.i.d. case. For the various parameters $\rho \in \{0.1, 0.2, 0.3, 0.4\}$ the gamma estimates for the densities of the AR models are given in Figures 5 and 6. Since the gamma-kernel estimators perform well for the various dependence rates, this is also true for the gamma-kernel pdf derivative estimators, but the bandwidth parameter must be selected differently.

Hence, these findings confirm the fact that the covariance term (2.3) of the pdf derivative is negligible in comparison with its variance and imply that one can use the same optimal bandwidth (1.10) both for independent and strongly mixed dependent data.

Figure 6: Gamma-kernel estimates of the pdf of the AR model with the Maxwell noise and $\rho \in \{0.1, 0.2, 0.3, 0.4\}$ for the sample size $n = 2000$.

4. APPENDIX


Proof of Lemma 2.1. Taking the integral of (2.2) we get

$$MISE(\hat f'_n(x)) = \int_0^\infty \left(B(x)^2 + V(x) + C(x)\right)dx, \tag{4.1}$$

where

$$C(x) = \frac{2}{n}\sum_{i=1}^{n-1}\left(1 - \frac{i}{n}\right)\mathrm{cov}\left(K'_b(X_1), K'_b(X_{1+i})\right). \tag{4.2}$$

To evaluate the covariance we shall apply Davydov's inequality

$$\left|\mathrm{cov}\left(K'_b(X_1), K'_b(X_{1+i})\right)\right| \le 2\pi\,\alpha(i)^{1/r}\,\|K'_b(X_1)\|_q\,\|K'_b(X_{1+i})\|_p, \tag{4.3}$$

where $p^{-1} + q^{-1} + r^{-1} = 1$, $1 \le p, q, r \le \infty$ [3].

The latter norm for the case $x \ge 2b$ is determined by

$$\|K'_b(X_1)\|_q = \left(\int_0^\infty \left|\frac{K(y)L_1(y)}{b}\right|^q f(y)\,dy\right)^{1/q} = \frac{1}{b}\left(E\left(K(\xi_1)^{q-1}L_1(\xi_1)^q f(\xi_1)\right)\right)^{1/q}, \tag{4.4}$$

where $L_1(t)$ is introduced in (1.7). The kernel $K(\xi_1)$ was used in (4.4) as a density function and $\xi_1$ is a Gamma($\rho_1(x)$, $b$) random variable.

In the case $x \in [0, 2b)$, similarly we have

$$\|K'_b(X_1)\|_q = \left(\int_0^\infty \left|\frac{x\,K(y)L_2(y)}{2b^2}\right|^q f(y)\,dy\right)^{1/q} = \frac{x}{2b^2}\left(E\left(K(\xi_2)^{q-1}L_2(\xi_2)^q f(\xi_2)\right)\right)^{1/q}, \tag{4.5}$$

where $L_2(t)$ is determined by (1.7) and $\xi_2$ is a Gamma($\rho_2(x)$, $b$) random variable. Expressions (4.4) and (4.5) are constructed similarly, thus up to a certain point we will not distinguish between them.

By the standard theory of the gamma distribution it is known that $\mu = E(\xi) = \rho_b(x)\,b$ and the variance is given by $\mathrm{var}(\xi) = \rho_b(x)\,b^2$. For simplicity, we further use the notation $\rho$ instead of $\rho_b(x)$ defined in (1.4).

The Taylor expansion of both mathematical expectations in (4.4), (4.5) in the neighborhood of $\mu$ is represented by

$$\begin{aligned}
E\left(K(\xi)^{q-1}L(\xi)^q f(\xi)\right) ={} & K(\mu)^{q-1}L(\mu)^q f(\mu) + \left(K(\xi)^{q-1}L(\xi)^q f(\xi)\right)'\Big|_{\xi=\mu} E(\xi - \mu) \\
& + \left(K(\xi)^{q-1}L(\xi)^q f(\xi)\right)''\Big|_{\xi=\mu}\frac{E(\xi-\mu)^2}{2} + o\left(E(\xi-\mu)^2\right).
\end{aligned}$$

In the case when $x \ge 2b$, $\mu = \rho b = x$, $\mathrm{var}(\xi) = \rho b^2 = xb$, we get

E (K^y-'L^ynO) = A(3 f 1 (qL(x) q+1 f' (*) - L(x) q f(x)L'(x) 

— L(x) q+1 f (x) + bL(x) q f" (x) + q 2 L(x) q f(x)L'(x) + bq 2 L(x) q ~ 2 (L' (x)) 2 f (x) 
+ 2bqL(x) q ~ 1 L’ ( x ) f' (x) + bqL(x) q ~ l f {x)L" (x) — bqL(x) g ~ 2 (L 1 (x)) 2 ^ 

+ K {x)q lA 2 X)(g ~ 1} ^(g - 1 )f(x)L(x) q+1 + bL(x) q f’(x) 

+ bqL{x) q ~ l f {x)L' (x)^ +o( 6 2 ). 
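Using Stirling's formula

$$\Gamma(z) = \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z}\left(1 + O\left(\frac{1}{z}\right)\right)$$

we can rewrite the kernel function as

$$K(t) = \frac{t^{\rho-1}\exp(-t/b)}{b^{\rho}\,\Gamma(\rho)} = \frac{t^{\rho-1}\exp(-t/b)\exp(\rho)}{b^{\rho}\sqrt{2\pi}\,\rho^{\rho-1/2}\left(1 + O(1/\rho)\right)}.$$

Taking $\rho = \rho_1(x)$ according to (1.4) and $t = x$, it holds

$$K(x) = \frac{1}{\sqrt{2\pi}}\,\frac{x^{x/b-1}\exp((x-x)/b)}{b^{x/b}\,(x/b)^{x/b-1/2}\left(1 + O(b/x)\right)} = \frac{x^{-1/2}\,b^{-1/2}}{\sqrt{2\pi}\left(1 + O(b/x)\right)}.$$

Hence, its upper bound is given by

$$K(x) \le \frac{1}{\sqrt{2\pi x b}}. \tag{4.6}$$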
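The bound (4.6) is easy to check numerically in the interior case $x \ge 2b$. In the sketch below the function names are hypothetical, and the kernel is evaluated in log space via `math.lgamma` to avoid overflow for small $b$.

```python
import math


def kernel_at_x(x, b):
    """K_{rho_1(x), b}(x) with rho_1(x) = x / b (interior case x >= 2b)."""
    r = x / b
    return math.exp((r - 1.0) * math.log(x) - x / b - r * math.log(b) - math.lgamma(r))


def stirling_bound(x, b):
    """Right-hand side of (4.6)."""
    return 1.0 / math.sqrt(2.0 * math.pi * x * b)
```

For small $b/x$ the ratio `kernel_at_x(x, b) / stirling_bound(x, b)` stays below one and approaches one, consistent with the factor $(1 + O(b/x))^{-1}$.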


Next, using the asymptotic expansion of the Digamma function $\Psi(x) = \ln x - \frac{1}{2x} - \frac{1}{12x^2} + \frac{1}{120x^4} + O(1/x^6)$, the first equation in (1.7) can be rewritten as

$$L_1(\rho_1 b) = \ln(\rho_1 b) - \ln b - \Psi(\rho_1) = \frac{b}{2x} + \frac{b^2}{12x^2} + o(b^2). \tag{4.7}$$
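The expansion (4.7) can be verified numerically. Since the Python standard library has no Digamma function, the sketch below uses a simple stand-in (`digamma`, based on the recurrence $\Psi(z) = \Psi(z+1) - 1/z$ and the asymptotic series); this helper and the other function names are assumptions for illustration.

```python
import math


def digamma(z):
    """Digamma for z > 0: recurrence up to z >= 6, then an asymptotic series."""
    shift = 0.0
    while z < 6.0:
        shift -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    series = math.log(z) - 0.5 / z - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series


def l1_at_x(x, b):
    """L_1(t) from (1.7) evaluated at t = x, where rho_1(x) = x / b."""
    return math.log(x) - math.log(b) - digamma(x / b)


def l1_expansion(x, b):
    """The two leading terms on the right-hand side of (4.7)."""
    return b / (2.0 * x) + b * b / (12.0 * x * x)
```

For small $b/x$ the two functions agree up to terms of higher order in $b$.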

Then substituting (4.6) in (4.4) and using the expressions (4.6) and (4.7), we deduce

$$\|K'_b(X_1)\|_q \le \pi^{\frac{1-q}{2q}}\,(2x)^{\frac{1-q}{2q}-1}\,b^{\frac{1-q}{2q}}\left(b^2 C_2(q,x) + b\,C_1(q,x) + C_3(q,x)\right)^{1/q} + o(b^2),$$


where we used the notations 

2q 3 - 9g 2 + 4q - 33 


(4.8) Oi(<?,x) = -/(x)- 


24x 


w/ \ Q + 1 , /*/// \ ^ 

-/(®)— 2 ~ + / O)^. 


0 2 (g,x) = /(x) 


2g + 54x — g 2 x + 21 <j 3 x + q 4 x + 93 qx 


144x 3 


- f'( x ) 

0 3 (<?,x) = -f(x) 


//'_\(9 + 1 ) 5 


+ r (*) 


q + 1 


12 x J v ' 12 
(^ + !)((/ - 2 ) 


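The same steps can be done for $\|K'_b(X_{1+i})\|_p$ from (4.3). Then, if $p = q$ holds, one can represent Davydov's inequality (4.3) as

$$\left|\mathrm{cov}\left(K'_b(X_1), K'_b(X_{1+i})\right)\right| \le 2\pi\,\alpha(i)^{1/r}\,\pi^{\frac{1-q}{q}}\,(2x)^{\frac{1-q}{q}-2}\,b^{\frac{1-q}{q}}\left(b^2 C_2(q,x) + b\,C_1(q,x) + C_3(q,x)\right)^{2/q} + o(b^2). \tag{4.9}$$

Using (4.9) and taking $p = q = 2 + \delta$, $r = \frac{2+\delta}{\delta}$, it can be deduced that the covariance (4.2) satisfies

$$\begin{aligned}
|C(x)| &= \left|\frac{2}{n}\sum_{i=1}^{n-1}\left(1 - \frac{i}{n}\right)\mathrm{cov}\left(K'_b(X_1), K'_b(X_{1+i})\right)\right| \\
&\le 2^{-\frac{2\delta+3}{\delta+2}}\,\pi^{\frac{1}{\delta+2}}\,x^{-\frac{3\delta+5}{\delta+2}}\,\frac{b^{-\frac{\delta+1}{\delta+2}}}{n}\left(b^2 C_2(\delta,x) + b\,C_1(\delta,x) + C_3(\delta,x)\right)^{\frac{2}{2+\delta}}\sum_{i=1}^{n-1}\left(1 - \frac{i}{n}\right)\alpha(i)^{\frac{\delta}{2+\delta}} + o(b^2).
\end{aligned}$$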


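Then we can estimate the covariance by the previous expressions

$$\begin{aligned}
|C(x)| &\le S(b,x,\delta,n)\sum_{\tau=2}^{n}\left(1 - \frac{\tau-1}{n}\right)\alpha(\tau-1)^{\frac{\delta}{2+\delta}} + o(b^2) \\
&\le S(b,x,\delta,n)\sum_{\tau=2}^{\infty}\alpha(\tau-1)^{\frac{\delta}{2+\delta}} + o(b^2) \le S(b,x,\delta,n)\int_1^\infty \alpha(\tau)^{\frac{\delta}{2+\delta}}\,d\tau + o(b^2),
\end{aligned}$$

where we used the following notation

$$S(b,x,\delta,n) = 2^{-\frac{2\delta+3}{\delta+2}}\,\pi^{\frac{1}{\delta+2}}\,x^{-\frac{3\delta+5}{\delta+2}}\,\frac{b^{-\frac{\delta+1}{\delta+2}}}{n}\left(b^2 C_2(\delta,x) + b\,C_1(\delta,x) + C_3(\delta,x)\right)^{\frac{2}{2+\delta}}.$$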


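Let us denote $\frac{\delta}{2+\delta} = \upsilon$, $0 < \upsilon < 1$. Then we obtain the upper bound of the covariance

$$|C(x)| \le \left(2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,\frac{b^{-\frac{\upsilon+1}{2}}}{n}\left(b^2 C_2(\upsilon,x) + b\,C_1(\upsilon,x) + C_3(\upsilon,x)\right)^{1-\upsilon} + o(b^2)\right)\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau.$$

By $0 < \upsilon < 1$ it then follows that

$$|C(x)| \sim \frac{1}{n}\,b^{-\frac{\upsilon+1}{2}}.$$

Remark 4.1. The main contribution to the MISE (4.1) is provided by the part corresponding to $x \ge 2b$, so we will not do similar calculations here and further for $x \in [0, 2b)$ as $b \to 0$.

□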


Proof of Theorem 2.1. Regarding the dependent case it is known that the MISE contains the bias, the variance and the covariance. By (1.8) it follows that the integrated sum of the squared bias and the variance is the following expression

$$\int_0^\infty \left(B(x)^2 + V(x)\right)dx = \frac{b^2}{16}\int_0^\infty P(x)\,dx + \int_0^\infty \frac{n^{-1}b^{-3/2}x^{-3/2}}{4\sqrt{\pi}}\left(f(x) + \frac{b}{2}\left(\frac{f(x)}{x} - f'(x)\right)\right)dx + o\left(b^2 + n^{-1}b^{-3/2}\right). \tag{4.10}$$

This corresponds to the independent case.

By integration of (2.3) we get the upper bound of the integrated covariance

$$\int_0^\infty C(x)\,dx \le \int_0^\infty \left(2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,\frac{b^{-\frac{\upsilon+1}{2}}}{n}\,C_3(\upsilon,x)^{1-\upsilon} + o(b^2)\right)dx\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau. \tag{4.11}$$

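Combining (4.10) and (4.11), one can write

$$\begin{aligned}
MISE(\hat f'_n(x)) \le{} & \int_0^\infty \frac{n^{-1}b^{-3/2}x^{-3/2}}{4\sqrt{\pi}}\left(f(x) + \frac{b}{2}\left(\frac{f(x)}{x} - f'(x)\right)\right)dx \\
& + \int_0^\infty 2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,\frac{b^{-\frac{\upsilon+1}{2}}}{n}\,C_3(\upsilon,x)^{1-\upsilon}\,dx\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau \\
& + \frac{b^2}{16}\int_0^\infty P(x)\,dx + o\left(b^2 + n^{-1}b^{-3/2}\right).
\end{aligned}$$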


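The derivative of this expression in $b$ leads to

$$\begin{aligned}
& \frac{b}{8}\int_0^\infty P(x)\,dx - \frac{3n^{-1}b^{-5/2}}{8\sqrt{\pi}}\int_0^\infty x^{-3/2} f(x)\,dx - \frac{n^{-1}b^{-3/2}}{16\sqrt{\pi}}\int_0^\infty x^{-3/2}\left(\frac{f(x)}{x} - f'(x)\right)dx \\
& \quad - \frac{\upsilon+1}{2}\int_0^\infty 2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,\frac{b^{-\frac{\upsilon+3}{2}}}{n}\,C_3(\upsilon,x)^{1-\upsilon}\,dx\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau = 0.
\end{aligned} \tag{4.12}$$

Since $0 < \upsilon < 1$ holds as in Lemma 2.1, the covariance term in (4.12) has, as a function of $b$, at worst the rate

$$c_1 b^{-\frac{\upsilon+3}{2}} = O\left(b^{-2}\right),$$

where $c_1$ is a constant.

Neglecting the terms with $b^{-3/2}$ and $b^{-(\upsilon+3)/2}$ in comparison to the term containing $b^{-5/2}$, we simplify the equation to

$$b^{7/2}\int_0^\infty P(x)\,dx - \frac{3}{\sqrt{\pi}\,n}\int_0^\infty x^{-3/2} f(x)\,dx + o\left(b^{7/2}\right) = 0.$$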

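The optimal $b = O(n^{-2/7})$ is the same as in (1.10). Let us insert such $b$ in (2.4):

$$\begin{aligned}
MISE_{opt}(\hat f'_n(x)) ={} & \int_0^\infty \frac{P(x)\,n^{-4/7}\,T^{4/7}}{16}\,dx + \int_0^\infty \frac{n^{-4/7}\,T^{-3/7}\,x^{-3/2}}{4\sqrt{\pi}}\,f(x)\,dx \\
& + \int_0^\infty \frac{n^{-6/7}\,T^{-1/7}\,x^{-3/2}}{8\sqrt{\pi}}\left(\frac{f(x)}{x} - f'(x)\right)dx \\
& + \int_0^\infty 2^{-\frac{\upsilon+3}{2}}\,\pi^{\frac{1-\upsilon}{2}}\,x^{-\frac{\upsilon+5}{2}}\,T^{-\frac{\upsilon+1}{7}}\,n^{\frac{\upsilon-6}{7}}\,C_3(\upsilon,x)^{1-\upsilon}\,dx\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau,
\end{aligned} \tag{4.13}$$

where

$$T = \frac{3\int_0^\infty x^{-3/2} f(x)\,dx}{\sqrt{\pi}\int_0^\infty \left(\frac{f(x)}{3x^2} + f''(x)\right)^2 dx}.$$

The last term in (4.13) has the rate $O\left(n^{\frac{\upsilon-6}{7}}\right)$. By $0 < \upsilon < 1$ we get that the optimal rate of convergence of the MISE is given by $MISE_{opt}(\hat f'_n(x)) = O(n^{-4/7})$. □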


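Proof of Lemma 2.2. We have to prove that $\alpha(\tau)$ defined by (2.6) satisfies the conditions of Lemma 2.1. Conditions 2 and 3 of Lemma 2.1 only refer to the density. Thus, it remains to check only the first condition of Lemma 2.1.

To this end, using (2.6) we get

$$\begin{aligned}
\int_1^\infty \alpha(\tau)^{\upsilon}\,d\tau &\le \int_1^{\tau_0} d\tau + \int_{\tau_0}^\infty \left(2(C+1)\,E|X_i|^{\nu}\,|\rho^{\nu}|^{\tau}\right)^{\upsilon} d\tau \\
&= \tau_0 - 1 + \left(2(C+1)\,E|X_i|^{\nu}\right)^{\upsilon}\int_{\tau_0}^\infty \left(|\rho^{\nu}|^{\tau}\right)^{\upsilon} d\tau.
\end{aligned} \tag{4.14}$$

The integral in (4.14) can be taken in general as

$$\int_{\tau_0}^\infty \left(|\rho^{\nu}|^{\tau}\right)^{\upsilon} d\tau = \left.\frac{|\rho^{\nu}|^{\tau\upsilon}}{\upsilon\ln|\rho^{\nu}|}\right|_{\tau_0}^{\infty}.$$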
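For $0 < |\rho^{\nu}| < 1$ the upper limit contributes nothing and the integral equals $-|\rho^{\nu}|^{\tau_0\upsilon} / (\upsilon\ln|\rho^{\nu}|)$. The sketch below compares this closed form with a midpoint-rule approximation; the function names, the truncation point `upper` and the grid size are assumptions for illustration, and `a` stands for $|\rho^{\nu}|$.

```python
import math


def tail_integral_closed_form(a, upsilon, tau0):
    """Integral of a**(tau * upsilon) over [tau0, inf) for 0 < a < 1."""
    return -a ** (tau0 * upsilon) / (upsilon * math.log(a))


def tail_integral_numeric(a, upsilon, tau0, upper=400.0, steps=400_000):
    """Midpoint-rule approximation of the same integral on [tau0, upper]."""
    h = (upper - tau0) / steps
    return h * sum(a ** ((tau0 + (k + 0.5) * h) * upsilon) for k in range(steps))
```

With, say, `a = 0.3 ** 0.5`, `upsilon = 0.5` and `tau0 = 2.0`, both functions give the same finite value, which illustrates why the first condition of Lemma 2.1 holds for geometrically decaying mixing numbers.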


Thus, to satisfy the first condition of Lemma 2.1 it must be

$$\left.\frac{|\rho^{\nu}|^{\tau\upsilon}}{\upsilon\ln|\rho^{\nu}|}\right|_{\tau=\infty} < \infty. \tag{4.15}$$

Since $\rho \in (-1,1)$ holds, it follows that $|\rho| \in [0,1)$. For $\rho = 0$ (4.15) is satisfied. For $|\rho| \in (0,1)$, writing $|\rho| = 1/e$ with $e > 1$, one can rewrite (4.15) as

$$\left.\left(\frac{1}{e}\right)^{\tau\upsilon\nu}\right|_{\tau=\infty} < \infty, \qquad e > 1,$$

which is valid as $\upsilon\nu > 0$. The latter is true since $0 < \upsilon < 1$ and $\nu = \min\{p, q, 1\} > 0$. Thus, the strong mixing AR(1) process (2.5) satisfies Lemma 2.1. Hence, it satisfies the conditions of Theorem 2.1. □